Overview

Dataset statistics

Number of variables: 14
Number of observations: 281880
Missing cells: 44736
Missing cells (%): 1.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 30.1 MiB
Average record size in memory: 112.0 B
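
The percentage and per-record figures can be recomputed from the raw counts. The short check below is a sketch rather than the report's own code: the variable names are illustrative, and it assumes that "Missing cells (%)" divides by variables × observations and that "Average record size" divides total memory by the number of observations.

```python
# Recompute the derived overview figures from the raw counts in the report.
n_variables = 14
n_observations = 281_880
missing_cells = 44_736
total_memory_mib = 30.1  # as displayed in the report, already rounded

# Assumption: missing cells are divided by the total number of cells.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Assumption: total memory is divided by the number of observations.
avg_record_bytes = total_memory_mib * 1024 ** 2 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results agree with the displayed values after rounding to one decimal place.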

Variable types

Categorical: 4
DateTime: 1
Numeric: 9

Alerts

VERSIE has constant value "1.0" (Constant)
DATUM_BESTAND has constant value "2021-09-13" (Constant)
PEILDATUM has constant value "2021-09-01" (Constant)
TYPERENDE_DIAGNOSE_CD has a high cardinality: 1770 distinct values (High cardinality)
BEHANDELEND_SPECIALISME_CD is highly correlated with AANTAL_PAT_PER_SPC (High correlation)
AANTAL_PAT_PER_ZPD is highly correlated with AANTAL_SUBTRAJECT_PER_ZPD (High correlation)
AANTAL_SUBTRAJECT_PER_ZPD is highly correlated with AANTAL_PAT_PER_ZPD (High correlation)
AANTAL_PAT_PER_DIAG is highly correlated with AANTAL_SUBTRAJECT_PER_DIAG (High correlation)
AANTAL_SUBTRAJECT_PER_DIAG is highly correlated with AANTAL_PAT_PER_DIAG (High correlation)
AANTAL_PAT_PER_SPC is highly correlated with BEHANDELEND_SPECIALISME_CD and 1 other field (High correlation)
AANTAL_SUBTRAJECT_PER_SPC is highly correlated with AANTAL_PAT_PER_SPC (High correlation)
AANTAL_PAT_PER_SPC is highly correlated with AANTAL_SUBTRAJECT_PER_SPC (High correlation)
DATUM_BESTAND is highly correlated with PEILDATUM and 1 other field (High correlation)
PEILDATUM is highly correlated with DATUM_BESTAND and 1 other field (High correlation)
VERSIE is highly correlated with DATUM_BESTAND and 1 other field (High correlation)
JAAR is highly correlated with AANTAL_PAT_PER_SPC and 1 other field (High correlation)
AANTAL_PAT_PER_SPC is highly correlated with JAAR and 1 other field (High correlation)
AANTAL_SUBTRAJECT_PER_SPC is highly correlated with JAAR and 1 other field (High correlation)
GEMIDDELDE_VERKOOPPRIJS has 44736 (15.9%) missing values (Missing)
AANTAL_SUBTRAJECT_PER_ZPD is highly skewed (γ1 = 21.2566906) (Skewed)

Reproduction

Analysis started: 2021-10-07 18:35:12.584758
Analysis finished: 2021-10-07 18:35:36.596409
Duration: 24.01 seconds
Software version: pandas-profiling v3.1.1
Download configuration: config.json

Variables

VERSIE
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Value  Count
1.0    281880

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 845640
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
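
As an illustration of such character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper `category_counts` is a hypothetical name, not part of the report's tooling. The two-letter codes `Nd` and `Po` correspond to the "Decimal Number" and "Other Punctuation" categories listed for this variable. Script and block lookups are not available in the standard library and are left out.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Five rows of the constant value "1.0", as in the sample for VERSIE.
counts = category_counts(["1.0"] * 5)
print(counts)
```

Each "1.0" contributes two decimal digits and one punctuation character, which matches the 66.7% / 33.3% split the report shows for the full column.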

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value  Count   Frequency (%)
1.0    281880  100.0%

Length

[Figure: Histogram of lengths of the category]

Pie chart

[Figure: Pie chart of category frequencies]

Most occurring characters

Value  Count   Frequency (%)
1      281880  33.3%
.      281880  33.3%
0      281880  33.3%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     563760  66.7%
Other Punctuation  281880  33.3%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
1      281880  50.0%
0      281880  50.0%

Other Punctuation
Value  Count   Frequency (%)
.      281880  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  845640  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
1      281880  33.3%
.      281880  33.3%
0      281880  33.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  845640  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
1      281880  33.3%
.      281880  33.3%
0      281880  33.3%

DATUM_BESTAND
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Value       Count
2021-09-13  281880

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 2818800
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021-09-13
2nd row: 2021-09-13
3rd row: 2021-09-13
4th row: 2021-09-13
5th row: 2021-09-13

Common Values

Value       Count   Frequency (%)
2021-09-13  281880  100.0%

Length

[Figure: Histogram of lengths of the category]

Pie chart

[Figure: Pie chart of category frequencies]

Most occurring characters

Value  Count   Frequency (%)
2      563760  20.0%
0      563760  20.0%
1      563760  20.0%
-      563760  20.0%
9      281880  10.0%
3      281880  10.0%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    2255040  80.0%
Dash Punctuation  563760   20.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
2      563760  25.0%
0      563760  25.0%
1      563760  25.0%
9      281880  12.5%
3      281880  12.5%

Dash Punctuation
Value  Count   Frequency (%)
-      563760  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  2818800  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
2      563760  20.0%
0      563760  20.0%
1      563760  20.0%
-      563760  20.0%
9      281880  10.0%
3      281880  10.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2818800  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
2      563760  20.0%
0      563760  20.0%
1      563760  20.0%
-      563760  20.0%
9      281880  10.0%
3      281880  10.0%

PEILDATUM
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Value       Count
2021-09-01  281880

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 2818800
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021-09-01
2nd row: 2021-09-01
3rd row: 2021-09-01
4th row: 2021-09-01
5th row: 2021-09-01

Common Values

Value       Count   Frequency (%)
2021-09-01  281880  100.0%

Length

[Figure: Histogram of lengths of the category]

Pie chart

[Figure: Pie chart of category frequencies]

Most occurring characters

Value  Count   Frequency (%)
0      845640  30.0%
2      563760  20.0%
1      563760  20.0%
-      563760  20.0%
9      281880  10.0%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    2255040  80.0%
Dash Punctuation  563760   20.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      845640  37.5%
2      563760  25.0%
1      563760  25.0%
9      281880  12.5%

Dash Punctuation
Value  Count   Frequency (%)
-      563760  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  2818800  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      845640  30.0%
2      563760  20.0%
1      563760  20.0%
-      563760  20.0%
9      281880  10.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2818800  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      845640  30.0%
2      563760  20.0%
1      563760  20.0%
-      563760  20.0%
9      281880  10.0%

JAAR
Date

HIGH CORRELATION

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB
Minimum: 2012-01-01 00:00:00
Maximum: 2021-01-01 00:00:00

[Figure: Histogram with fixed size bins (bins=10)]

BEHANDELEND_SPECIALISME_CD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 27
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 424.6211189
Minimum: 301
Maximum: 8418
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 301
5-th percentile: 302
Q1: 305
median: 313
Q3: 322
95-th percentile: 335
Maximum: 8418
Range: 8117
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 933.4165885
Coefficient of variation (CV): 2.198234018
Kurtosis: 69.20096538
Mean: 424.6211189
Median Absolute Deviation (MAD): 8
Skewness: 8.431478186
Sum: 119692201
Variance: 871266.5277
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=27)]
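
Several of the statistics for BEHANDELEND_SPECIALISME_CD are simple functions of others. The check below is a sketch with illustrative variable names, using the usual textbook definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared). It assumes the report follows these definitions, and the figures are consistent with that.

```python
# Values copied from the BEHANDELEND_SPECIALISME_CD statistics.
minimum, maximum = 301, 8418
q1, q3 = 305, 322
mean = 424.6211189
std = 933.4165885

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(value_range, iqr, round(cv, 6), round(variance, 2))
```

The large gap between the tiny IQR (17) and the range (8117) reflects a few specialism codes far above the 301 to 390 band, such as 1900 and 8418. Because these are identifier codes, the moments are of limited interpretive value.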
Common values

Value  Count  Frequency (%)
305    39908  14.2%
313    36584  13.0%
303    32463  11.5%
330    22561  8.0%
316    19192  6.8%
308    14676  5.2%
306    11727  4.2%
324    11699  4.2%
301    11461  4.1%
304    9231   3.3%
Other values (17)  72378  25.7%

Minimum 10 values

Value  Count  Frequency (%)
301    11461  4.1%
302    6166   2.2%
303    32463  11.5%
304    9231   3.3%
305    39908  14.2%
306    11727  4.2%
307    4900   1.7%
308    14676  5.2%
310    3165   1.1%
313    36584  13.0%

Maximum 10 values

Value  Count  Frequency (%)
8418   3784   1.3%
1900   186    0.1%
390    759    0.3%
389    3032   1.1%
362    3977   1.4%
361    1997   0.7%
335    2878   1.0%
330    22561  8.0%
329    748    0.3%
328    6001   2.1%

TYPERENDE_DIAGNOSE_CD
Categorical

HIGH CARDINALITY

Distinct: 1770
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Value  Count
101    1191
402    1165
403    1134
301    1125
203    1066
Other values (1765)  276199

Length

Max length: 4
Median length: 3
Mean length: 3.349155669
Min length: 2

Characters and Unicode

Total characters: 944060
Distinct characters: 25
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 2401
2nd row: 2401
3rd row: 2405
4th row: 2405
5th row: 2405

Common Values

Value  Count  Frequency (%)
101    1191   0.4%
402    1165   0.4%
403    1134   0.4%
301    1125   0.4%
203    1066   0.4%
201    1060   0.4%
401    947    0.3%
404    946    0.3%
802    926    0.3%
409    920    0.3%
Other values (1760)  271400  96.3%

Length

[Figure: Histogram of lengths of the category]

Most occurring characters

Value  Count   Frequency (%)
1      180710  19.1%
0      172605  18.3%
2      125052  13.2%
3      102527  10.9%
5      72545   7.7%
9      68321   7.2%
4      67275   7.1%
7      55594   5.9%
6      49310   5.2%
8      40539   4.3%
Other values (15)  9582  1.0%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    934478  99.0%
Uppercase Letter  9582    1.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
G      1798   18.8%
M      1583   16.5%
B      1151   12.0%
E      824    8.6%
Z      772    8.1%
D      650    6.8%
A      627    6.5%
F      610    6.4%
C      314    3.3%
K      308    3.2%
Other values (5)  945  9.9%

Decimal Number
Value  Count   Frequency (%)
1      180710  19.3%
0      172605  18.5%
2      125052  13.4%
3      102527  11.0%
5      72545   7.8%
9      68321   7.3%
4      67275   7.2%
7      55594   5.9%
6      49310   5.3%
8      40539   4.3%

Most occurring scripts

Value   Count   Frequency (%)
Common  934478  99.0%
Latin   9582    1.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
G      1798   18.8%
M      1583   16.5%
B      1151   12.0%
E      824    8.6%
Z      772    8.1%
D      650    6.8%
A      627    6.5%
F      610    6.4%
C      314    3.3%
K      308    3.2%
Other values (5)  945  9.9%

Common
Value  Count   Frequency (%)
1      180710  19.3%
0      172605  18.5%
2      125052  13.4%
3      102527  11.0%
5      72545   7.8%
9      68321   7.3%
4      67275   7.2%
7      55594   5.9%
6      49310   5.3%
8      40539   4.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  944060  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
1      180710  19.1%
0      172605  18.3%
2      125052  13.2%
3      102527  10.9%
5      72545   7.7%
9      68321   7.2%
4      67275   7.1%
7      55594   5.9%
6      49310   5.2%
8      40539   4.3%
Other values (15)  9582  1.0%

ZORGPRODUCT_CD
Real number (ℝ≥0)

Distinct: 5933
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 439730590.5
Minimum: 10501002
Maximum: 998418081
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 10501002
5-th percentile: 28999037
Q1: 99799028
median: 149599019
Q3: 990004004
95-th percentile: 990516015
Maximum: 998418081
Range: 987917079
Interquartile range (IQR): 890204976

Descriptive statistics

Standard deviation: 428873091.2
Coefficient of variation (CV): 0.9753087468
Kurtosis: -1.732662051
Mean: 439730590.5
Median Absolute Deviation (MAD): 119600013
Skewness: 0.4724661679
Sum: 1.239512589 × 10^14
Variance: 1.839321284 × 10^17
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value      Count  Frequency (%)
990004009  2089   0.7%
990004007  2049   0.7%
990003004  1975   0.7%
990004006  1637   0.6%
990356076  1483   0.5%
990356073  1359   0.5%
990003007  1274   0.5%
131999228  1270   0.5%
131999164  1249   0.4%
199299013  1188   0.4%
Other values (5923)  266307  94.5%

Minimum 10 values

Value     Count  Frequency (%)
10501002  7      < 0.1%
10501003  10     < 0.1%
10501004  10     < 0.1%
10501005  10     < 0.1%
10501007  3      < 0.1%
10501008  10     < 0.1%
10501010  10     < 0.1%
10501011  3      < 0.1%
11101002  9      < 0.1%
11101003  10     < 0.1%

Maximum 10 values

Value      Count  Frequency (%)
998418081  135    < 0.1%
998418080  122    < 0.1%
998418079  35     < 0.1%
998418077  7      < 0.1%
998418076  7      < 0.1%
998418075  6      < 0.1%
998418074  186    0.1%
998418073  186    0.1%
998418072  7      < 0.1%
998418071  7      < 0.1%

AANTAL_PAT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 9326
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 503.0807365
Minimum: 1
Maximum: 163749
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 13
Q3: 100
95-th percentile: 1688
Maximum: 163749
Range: 163748
Interquartile range (IQR): 97

Descriptive statistics

Standard deviation: 3142.423395
Coefficient of variation (CV): 6.246360012
Kurtosis: 405.490553
Mean: 503.0807365
Median Absolute Deviation (MAD): 12
Skewness: 16.75773128
Sum: 141808398
Variance: 9874824.794
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value  Count  Frequency (%)
1      47009  16.7%
2      23023  8.2%
3      15075  5.3%
4      11030  3.9%
5      8493   3.0%
6      7268   2.6%
7      6025   2.1%
8      5067   1.8%
9      4686   1.7%
10     4129   1.5%
Other values (9316)  150075  53.2%

Minimum 10 values

Value  Count  Frequency (%)
1      47009  16.7%
2      23023  8.2%
3      15075  5.3%
4      11030  3.9%
5      8493   3.0%
6      7268   2.6%
7      6025   2.1%
8      5067   1.8%
9      4686   1.7%
10     4129   1.5%

Maximum 10 values

Value   Count  Frequency (%)
163749  1      < 0.1%
155870  1      < 0.1%
154272  1      < 0.1%
145011  1      < 0.1%
144726  1      < 0.1%
116984  1      < 0.1%
115605  1      < 0.1%
110208  1      < 0.1%
109677  1      < 0.1%
108959  1      < 0.1%

AANTAL_SUBTRAJECT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 9999
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 591.3416241
Minimum: 1
Maximum: 239907
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 14
Q3: 109
95-th percentile: 1915
Maximum: 239907
Range: 239906
Interquartile range (IQR): 106

Descriptive statistics

Standard deviation: 4004.588702
Coefficient of variation (CV): 6.772039273
Kurtosis: 717.701999
Mean: 591.3416241
Median Absolute Deviation (MAD): 13
Skewness: 21.2566906
Sum: 166687377
Variance: 16036730.67
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
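
The skewness of 21.26 flagged for this variable indicates a long right tail: most rows have a handful of subtrajects while a few have hundreds of thousands. The sketch below computes the plain moment-based skewness g1 = m3 / m2^1.5 on toy data. This is an assumption about the definition, since the report does not state which estimator it uses; a bias-adjusted variant would give nearly the same value at this sample size.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 50]  # one large value, like the counts above

print(skewness(symmetric))
print(skewness(right_tailed))
```

The symmetric sample yields zero skewness, while the single large value in the second sample pushes it well above 1.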
Common values

Value  Count  Frequency (%)
1      45329  16.1%
2      22637  8.0%
3      14933  5.3%
4      10829  3.8%
5      8441   3.0%
6      7234   2.6%
7      5980   2.1%
8      5025   1.8%
9      4638   1.6%
10     4104   1.5%
Other values (9989)  152730  54.2%

Minimum 10 values

Value  Count  Frequency (%)
1      45329  16.1%
2      22637  8.0%
3      14933  5.3%
4      10829  3.8%
5      8441   3.0%
6      7234   2.6%
7      5980   2.1%
8      5025   1.8%
9      4638   1.6%
10     4104   1.5%

Maximum 10 values

Value   Count  Frequency (%)
239907  1      < 0.1%
232484  1      < 0.1%
231390  1      < 0.1%
227658  1      < 0.1%
221521  1      < 0.1%
218623  1      < 0.1%
216424  1      < 0.1%
212709  1      < 0.1%
208634  1      < 0.1%
204748  1      < 0.1%

AANTAL_PAT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 8203
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7563.262108
Minimum: 1
Maximum: 226763
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 38
Q1: 381
median: 1660
Q3: 6160
95-th percentile: 36249
Maximum: 226763
Range: 226762
Interquartile range (IQR): 5779

Descriptive statistics

Standard deviation: 17744.97243
Coefficient of variation (CV): 2.346206197
Kurtosis: 34.07260824
Mean: 7563.262108
Median Absolute Deviation (MAD): 1516
Skewness: 5.080534233
Sum: 2131932323
Variance: 314884046.4
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value  Count  Frequency (%)
21     460    0.2%
19     458    0.2%
8      454    0.2%
12     450    0.2%
37     449    0.2%
9      442    0.2%
28     440    0.2%
17     422    0.1%
23     416    0.1%
4      416    0.1%
Other values (8193)  277473  98.4%

Minimum 10 values

Value  Count  Frequency (%)
1      334    0.1%
2      363    0.1%
3      379    0.1%
4      416    0.1%
5      372    0.1%
6      390    0.1%
7      341    0.1%
8      454    0.2%
9      442    0.2%
10     318    0.1%

Maximum 10 values

Value   Count  Frequency (%)
226763  23     < 0.1%
213509  25     < 0.1%
212038  17     < 0.1%
210804  17     < 0.1%
210440  19     < 0.1%
205458  24     < 0.1%
204673  17     < 0.1%
200179  16     < 0.1%
198534  20     < 0.1%
189111  19     < 0.1%

AANTAL_SUBTRAJECT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 9074
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10796.6829
Minimum: 1
Maximum: 366095
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 48
Q1: 499
median: 2276
Q3: 8777
95-th percentile: 51048
Maximum: 366095
Range: 366094
Interquartile range (IQR): 8278

Descriptive statistics

Standard deviation: 26171.16459
Coefficient of variation (CV): 2.424000486
Kurtosis: 37.99419026
Mean: 10796.6829
Median Absolute Deviation (MAD): 2096
Skewness: 5.338255862
Sum: 3043368975
Variance: 684929855.9
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value  Count  Frequency (%)
13     376    0.1%
17     369    0.1%
38     361    0.1%
4      356    0.1%
19     355    0.1%
25     351    0.1%
24     339    0.1%
82     338    0.1%
23     336    0.1%
46     335    0.1%
Other values (9064)  278364  98.8%

Minimum 10 values

Value  Count  Frequency (%)
1      283    0.1%
2      299    0.1%
3      327    0.1%
4      356    0.1%
5      321    0.1%
6      318    0.1%
7      314    0.1%
8      309    0.1%
9      255    0.1%
10     320    0.1%

Maximum 10 values

Value   Count  Frequency (%)
366095  23     < 0.1%
348460  25     < 0.1%
341708  19     < 0.1%
323800  20     < 0.1%
320849  24     < 0.1%
311873  17     < 0.1%
309713  17     < 0.1%
297717  17     < 0.1%
288416  16     < 0.1%
267042  19     < 0.1%

AANTAL_PAT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 269
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 660918.4521
Minimum: 227
Maximum: 1489503
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 227
5-th percentile: 42210
Q1: 246242
median: 744717
Q3: 995483
95-th percentile: 1332889
Maximum: 1489503
Range: 1489276
Interquartile range (IQR): 749241

Descriptive statistics

Standard deviation: 426367.1097
Coefficient of variation (CV): 0.6451130367
Kurtosis: -1.186683755
Mean: 660918.4521
Median Absolute Deviation (MAD): 326629
Skewness: 0.02661943634
Sum: 1.862996933 × 10^11
Variance: 1.817889122 × 10^11
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value    Count  Frequency (%)
880968   5102   1.8%
874290   4354   1.5%
843993   4348   1.5%
894414   4333   1.5%
880560   4273   1.5%
890713   4209   1.5%
719212   4007   1.4%
1084171  3890   1.4%
1096221  3859   1.4%
1063689  3851   1.4%
Other values (259)  239654  85.0%

Minimum 10 values

Value  Count  Frequency (%)
227    7      < 0.1%
1531   121    < 0.1%
1609   130    < 0.1%
1923   131    < 0.1%
2004   196    0.1%
2316   64     < 0.1%
2497   173    0.1%
4340   81     < 0.1%
4413   297    0.1%
6811   380    0.1%

Maximum 10 values

Value    Count  Frequency (%)
1489503  2976   1.1%
1450623  3054   1.1%
1421848  3564   1.3%
1345233  3543   1.3%
1332889  3546   1.3%
1328827  3436   1.2%
1317382  3463   1.2%
1296723  1181   0.4%
1283083  3577   1.3%
1262595  1201   0.4%

AANTAL_SUBTRAJECT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 269
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1058151.97
Minimum: 230
Maximum: 2634761
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 230
5-th percentile: 47081
Q1: 356417
median: 1037070
Q3: 1729107
95-th percentile: 2488664
Maximum: 2634761
Range: 2634531
Interquartile range (IQR): 1372690

Descriptive statistics

Standard deviation: 745271.4756
Coefficient of variation (CV): 0.7043142161
Kurtosis: -0.9179410898
Mean: 1058151.97
Median Absolute Deviation (MAD): 692037
Skewness: 0.323116285
Sum: 2.982718774 × 10^11
Variance: 5.554295723 × 10^11
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value    Count  Frequency (%)
1211813  5102   1.8%
1281752  4354   1.5%
1216292  4348   1.5%
1315715  4333   1.5%
1300634  4273   1.5%
1327217  4209   1.5%
1073477  4007   1.4%
2556984  3890   1.4%
2634761  3859   1.4%
2488664  3851   1.4%
Other values (259)  239654  85.0%

Minimum 10 values

Value  Count  Frequency (%)
230    7      < 0.1%
1739   121    < 0.1%
1862   130    < 0.1%
2039   196    0.1%
2200   131    < 0.1%
2356   64     < 0.1%
2819   173    0.1%
4346   81     < 0.1%
4424   297    0.1%
7390   380    0.1%

Maximum 10 values

Value    Count  Frequency (%)
2634761  3859   1.4%
2594635  3845   1.4%
2556984  3890   1.4%
2488664  3851   1.4%
2398590  3708   1.3%
2184421  3757   1.3%
2066343  3810   1.4%
2028939  1168   0.4%
1985491  1167   0.4%
1978552  3691   1.3%

GEMIDDELDE_VERKOOPPRIJS
Real number (ℝ≥0)

MISSING

Distinct: 3291
Distinct (%): 1.4%
Missing: 44736
Missing (%): 15.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 3499.49206
Minimum: 0
Maximum: 287220
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 140
Q1: 460
median: 1215
Q3: 4015
95-th percentile: 13275
Maximum: 287220
Range: 287220
Interquartile range (IQR): 3555

Descriptive statistics

Standard deviation: 6546.109176
Coefficient of variation (CV): 1.870588378
Kurtosis: 163.0344839
Mean: 3499.49206
Median Absolute Deviation (MAD): 990
Skewness: 7.67066718
Sum: 829883545
Variance: 42851545.34
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value  Count  Frequency (%)
105    1859   0.7%
160    1856   0.7%
110    1548   0.5%
145    1439   0.5%
180    1344   0.5%
300    1278   0.5%
190    1223   0.4%
120    1223   0.4%
185    1212   0.4%
165    1211   0.4%
Other values (3281)  222951  79.1%
(Missing)  44736  15.9%

Minimum 10 values

Value  Count  Frequency (%)
0      2      < 0.1%
70     226    0.1%
75     76     < 0.1%
80     361    0.1%
85     920    0.3%
90     568    0.2%
95     686    0.2%
100    916    0.3%
105    1859   0.7%
110    1548   0.5%

Maximum 10 values

Value   Count  Frequency (%)
287220  8      < 0.1%
148910  3      < 0.1%
142835  4      < 0.1%
122155  4      < 0.1%
116765  3      < 0.1%
109725  7      < 0.1%
108570  7      < 0.1%
107655  4      < 0.1%
101270  8      < 0.1%
95465   7      < 0.1%

Interactions

[Figures: 81 pairwise interaction plots, one per pair of the 9 numeric variables; images not reproduced]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
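
A minimal pure-Python sketch of this calculation follows. The function names are illustrative and this is not the report's implementation, which presumably delegates to a library. Tied values receive their average rank, a common convention that is assumed here.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear
print(spearman(x, y), pearson(x, y))
```

For the cubic relationship the rank correlation is perfect, while the linear correlation on the raw values stays below 1.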

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
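
As a small illustration of this formula and of the invariance property, the sketch below computes r from scratch on made-up data. The function name `pearson_r` and the sample values are assumptions chosen for the example, not part of the dataset.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors in the covariance and the standard deviations cancel,
    so plain sums of deviations are used.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Made-up sample values.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Shifting and positively rescaling each variable separately leaves r unchanged.
x_rescaled = [10 * a + 7 for a in x]
y_rescaled = [0.5 * b - 3 for b in y]
r_rescaled = pearson_r(x_rescaled, y_rescaled)

# A negative scale factor flips the sign of r but not its magnitude.
r_flipped = pearson_r(x, [-b for b in y])
```

Here `r` and `r_rescaled` agree up to floating-point rounding, and `r_flipped` has the same magnitude with the opposite sign.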

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
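
The pair-counting definition above can be written out directly. The sketch below is a minimal illustration of that formula (often called τ-a); the function name `kendall_tau_a` and the sample data are hypothetical. Tied pairs count as neither concordant nor discordant here, which is an assumption: statistical libraries commonly default to a tie-adjusted variant (τ-b), so results may differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # The sign of the product tells whether x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in x or y; it is counted as neither.
    return (concordant - discordant) / len(pairs)


# Made-up sample values: 8 concordant and 2 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)
```

Comparing every pair is quadratic in the number of observations, which is fine for an illustration but slow for a dataset of this size.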

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
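
For illustration, the bias correction proposed by Bergsma can be sketched as follows. This is a from-scratch sketch of the published formula, not necessarily the exact code path used to produce this report; the function name `cramers_v_corrected` and the convention of returning 0.0 for degenerate inputs are assumptions.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))   # observed counts per (row, column) cell
    row_totals = Counter(a)
    col_totals = Counter(b)
    r, c = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or c < 2:
        return 0.0  # assumed convention for constant or empty input

    # Pearson chi-squared statistic over every cell, including empty ones.
    chi2 = 0.0
    for row, row_n in row_totals.items():
        for col, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint[(row, col)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias-corrected phi squared and adjusted table dimensions.
    phi2_corr = max(0.0, phi2 - (r - 1) * (c - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    c_corr = c - (c - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, c_corr - 1)
    if denom <= 0:
        return 0.0  # assumed convention when every value is unique
    return sqrt(phi2_corr / denom)


# Made-up labels: a perfectly associated pair and an independent pair.
v_perfect = cramers_v_corrected(list("xxxyyy"), list("pppqqq"))
v_independent = cramers_v_corrected([0, 0, 1, 1], [0, 1, 0, 1])
```

On these made-up inputs the perfectly associated pair gives a value of 1 (up to rounding) and the independent pair gives 0.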

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

  | VERSIE | DATUM_BESTAND | PEILDATUM | JAAR | BEHANDELEND_SPECIALISME_CD | TYPERENDE_DIAGNOSE_CD | ZORGPRODUCT_CD | AANTAL_PAT_PER_ZPD | AANTAL_SUBTRAJECT_PER_ZPD | AANTAL_PAT_PER_DIAG | AANTAL_SUBTRAJECT_PER_DIAG | AANTAL_PAT_PER_SPC | AANTAL_SUBTRAJECT_PER_SPC | GEMIDDELDE_VERKOOPPRIJS
0 | 1.0 | 2021-09-13 | 2021-09-01 | 2012-01-01 | 308 | 2401 | 19999009 | 4 | 4 | 18 | 18 | 74661 | 107866 | NaN
1 | 1.0 | 2021-09-13 | 2021-09-01 | 2012-01-01 | 308 | 2401 | 19999010 | 10 | 10 | 18 | 18 | 74661 | 107866 | NaN
2 | 1.0 | 2021-09-13 | 2021-09-01 | 2012-01-01 | 308 | 2405 | 19999014 | 2 | 2 | 53 | 53 | 74661 | 107866 | NaN
3 | 1.0 | 2021-09-13 | 2021-09-01 | 2012-01-01 | 308 | 2405 | 19999016 | 1 | 1 | 53 | 53 | 74661 | 107866 | NaN
4 | 1.0 | 2021-09-13 | 2021-09-01 | 2012-01-01 | 308 | 2405 | 19999017 | 1 | 1 | 53 | 53 | 74661 | 107866 | NaN
5 | 1.0 | 2021-09-13 | 2021-09-01 | 2012-01-01 | 308 | 2405 | 19999020 | 9 | 9 | 53 | 53 | 74661 | 107866 | NaN
6 | 1.0 | 2021-09-13 | 2021-09-01 | 2012-01-01 | 308 | 2405 | 19999021 | 40 | 40 | 53 | 53 | 74661 | 107866 | NaN
7 | 1.0 | 2021-09-13 | 2021-09-01 | 2012-01-01 | 308 | 2401 | 19999025 | 4 | 4 | 18 | 18 | 74661 | 107866 | NaN
8 | 1.0 | 2021-09-13 | 2021-09-01 | 2017-01-01 | 307 | M13 | 20108044 | 230 | 231 | 3315 | 6410 | 690915 | 1157783 | 12925.0
9 | 1.0 | 2021-09-13 | 2021-09-01 | 2017-01-01 | 307 | M13 | 20108045 | 2 | 2 | 3315 | 6410 | 690915 | 1157783 | NaN

Last rows

  | VERSIE | DATUM_BESTAND | PEILDATUM | JAAR | BEHANDELEND_SPECIALISME_CD | TYPERENDE_DIAGNOSE_CD | ZORGPRODUCT_CD | AANTAL_PAT_PER_ZPD | AANTAL_SUBTRAJECT_PER_ZPD | AANTAL_PAT_PER_DIAG | AANTAL_SUBTRAJECT_PER_DIAG | AANTAL_PAT_PER_SPC | AANTAL_SUBTRAJECT_PER_SPC | GEMIDDELDE_VERKOOPPRIJS
281870 | 1.0 | 2021-09-13 | 2021-09-01 | 2018-01-01 | 316 | 3504 | 991630064 | 4 | 5 | 17 | 22 | 447081 | 761995 | NaN
281871 | 1.0 | 2021-09-13 | 2021-09-01 | 2018-01-01 | 316 | 3504 | 991630065 | 10 | 11 | 17 | 22 | 447081 | 761995 | 205.0
281872 | 1.0 | 2021-09-13 | 2021-09-01 | 2018-01-01 | 316 | 3520 | 991630069 | 1 | 1 | 3385 | 5138 | 447081 | 761995 | NaN
281873 | 1.0 | 2021-09-13 | 2021-09-01 | 2018-01-01 | 316 | 3518 | 991630069 | 1 | 1 | 1791 | 2592 | 447081 | 761995 | NaN
281874 | 1.0 | 2021-09-13 | 2021-09-01 | 2018-01-01 | 316 | 7610 | 991630070 | 1 | 1 | 116 | 151 | 447081 | 761995 | 2660.0
281875 | 1.0 | 2021-09-13 | 2021-09-01 | 2018-01-01 | 316 | 3521 | 991630070 | 1 | 1 | 289 | 374 | 447081 | 761995 | 2660.0
281876 | 1.0 | 2021-09-13 | 2021-09-01 | 2018-01-01 | 316 | 3518 | 991630070 | 1 | 1 | 1791 | 2592 | 447081 | 761995 | 2660.0
281877 | 1.0 | 2021-09-13 | 2021-09-01 | 2018-01-01 | 316 | 3517 | 991630070 | 1 | 1 | 1085 | 1557 | 447081 | 761995 | 2660.0
281878 | 1.0 | 2021-09-13 | 2021-09-01 | 2018-01-01 | 316 | 3522 | 991630070 | 1 | 1 | 1357 | 1886 | 447081 | 761995 | 2660.0
281879 | 1.0 | 2021-09-13 | 2021-09-01 | 2018-01-01 | 316 | 3520 | 991630070 | 15 | 16 | 3385 | 5138 | 447081 | 761995 | 2660.0